Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

Authors

  • Masashi Sugiyama
  • Shinichi Nakajima
  • Hisashi Kashima
  • Paul von Bünau
  • Motoaki Kawanabe

Abstract

When training and test samples follow different input distributions (i.e., the situation called covariate shift), the maximum likelihood estimator is known to lose its consistency. To regain consistency, the log-likelihood terms need to be weighted according to the importance (i.e., the ratio of test and training input densities). Thus, accurately estimating the importance is one of the key tasks in covariate shift adaptation. A naive approach is to first estimate the training and test input densities and then compute the importance as the ratio of the density estimates. However, since density estimation is a hard problem, this approach tends to perform poorly, especially in high-dimensional cases. In this paper, we propose a direct importance estimation method that does not require the input density estimates. Our method is equipped with a natural model selection procedure, so that tuning parameters such as the kernel width can be objectively optimized. This is an advantage over a recently developed method of direct importance estimation. Simulations illustrate the usefulness of our approach.
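The abstract describes direct importance estimation with built-in model selection but gives no formulas. The following is a minimal pure-Python sketch under our own assumptions, not the paper's exact procedure: the importance is modeled as a non-negative combination of Gaussian kernels centered at the test inputs, w(x) = sum_l alpha_l K(x, c_l); alpha is fit by projected gradient ascent on the test log-likelihood sum_j log w(x_te_j), subject to the training-average constraint (1/n_tr) sum_i w(x_tr_i) = 1; and the kernel width sigma is chosen by likelihood cross-validation over held-out test points. The function names (`direct_importance_fit`, `select_sigma_cv`, `importance`), the step size, iteration count, fold count, sigma grid, and the toy Gaussian data are hypothetical choices made for illustration.

```python
import math
import random

TINY = 1e-300  # assumption: floor that keeps divisions and logs finite if a kernel sum underflows


def gaussian_kernel(x, c, sigma):
    """Gaussian kernel between two scalar inputs."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def importance(x, centers, alpha, sigma):
    """Kernel model of the importance: w(x) = sum_l alpha_l * K(x, c_l)."""
    return sum(a * gaussian_kernel(x, c, sigma) for a, c in zip(alpha, centers))


def direct_importance_fit(x_tr, x_te, centers, sigma, eps=1e-3, n_iter=300):
    """Maximize sum_j log w(x_te_j) s.t. mean_i w(x_tr_i) = 1 and alpha >= 0.

    Hypothetical helper: no density of x_tr or x_te is ever estimated.
    """
    A = [[gaussian_kernel(x, c, sigma) for c in centers] for x in x_te]
    b = [sum(gaussian_kernel(x, c, sigma) for x in x_tr) / len(x_tr) for c in centers]
    bb = sum(v * v for v in b)
    n_c = len(centers)
    alpha = [1.0] * n_c
    for _ in range(n_iter):
        # Gradient ascent on the test log-likelihood: grad_l = sum_j A_jl / w(x_te_j).
        inv_w = [1.0 / max(sum(a * al for a, al in zip(row, alpha)), TINY) for row in A]
        grad = [sum(row[l] * iw for row, iw in zip(A, inv_w)) for l in range(n_c)]
        alpha = [al + eps * g for al, g in zip(alpha, grad)]
        # Project onto b^T alpha = 1, clip to alpha >= 0, then renormalize.
        ba = sum(bl * al for bl, al in zip(b, alpha))
        alpha = [al + (1.0 - ba) * bl / bb for al, bl in zip(alpha, b)]
        alpha = [max(0.0, al) for al in alpha]
        ba = sum(bl * al for bl, al in zip(b, alpha))
        alpha = [al / ba for al in alpha]
    return alpha


def select_sigma_cv(x_tr, x_te, sigmas, n_folds=3, seed=0):
    """Pick sigma by likelihood cross-validation over the test samples (hypothetical helper)."""
    idx = list(range(len(x_te)))
    random.Random(seed).shuffle(idx)
    folds = [idx[r::n_folds] for r in range(n_folds)]
    best_sigma, best_score = None, -math.inf
    for sigma in sigmas:
        score = 0.0
        for r in range(n_folds):
            held = [x_te[j] for j in folds[r]]
            rest = [x_te[j] for s in range(n_folds) if s != r for j in folds[s]]
            alpha = direct_importance_fit(x_tr, rest, rest, sigma)
            # Mean held-out log importance; higher means a better fit.
            score += sum(math.log(max(importance(x, rest, alpha, sigma), TINY))
                         for x in held) / len(held)
        score /= n_folds
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma


# Toy covariate shift (assumed data): training inputs ~ N(0, 1), test inputs ~ N(0.5, 0.5^2).
rng = random.Random(1)
x_tr = [rng.gauss(0.0, 1.0) for _ in range(40)]
x_te = [rng.gauss(0.5, 0.5) for _ in range(40)]

sigma = select_sigma_cv(x_tr, x_te, sigmas=[0.3, 0.6, 1.0])
alpha = direct_importance_fit(x_tr, x_te, x_te, sigma)
weights = [importance(x, x_te, alpha, sigma) for x in x_tr]  # w(x_tr_i) for reweighting
print("chosen sigma:", sigma)
print("w(0.5) =", importance(0.5, x_te, alpha, sigma))
print("w(-2.0) =", importance(-2.0, x_te, alpha, sigma))
```

The resulting `weights` are the estimated importance values at the training inputs; in covariate shift adaptation, each training log-likelihood (or loss) term would be multiplied by its weight. Only the ratio is modeled here, which is the contrast the abstract draws with the naive approach of estimating both densities and dividing.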

Similar resources

Covariate Shift Adaptation by Importance Weighted Cross Validation

A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distr...

Stochastic Density Ratio Estimation and Its Application to Feature Selection

In this work, we deal with a relatively new statistical tool in machine learning: the estimation of the ratio of two probability densities, or density ratio estimation for short. As a side piece of research that gained its own traction, we also tackle the task of parameter selection in learning algorithms based on kernel methods. 1 Density Ratio Estimation The estimation of the ratio of two pro...

Continuous Target Shift Adaptation in Supervised Learning

Supervised learning in machine learning concerns inferring an underlying relation between covariate x and target y based on training covariate-target data. It is traditionally assumed that training data and test data, on which the generalization performance of a learning algorithm is measured, follow the same probability distribution. However, this standard assumption is often violated in many ...

Learning under Non-Stationarity: Covariate Shift and Class-Balance Change

One of the fundamental assumptions behind many supervised machine learning algorithms is that training and test data follow the same probability distribution. However, this important assumption is often violated in practice, for example, because of an unavoidable sample selection bias or non-stationarity of the environment. Due to violation of the assumption, standard machine learning methods s...

Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation

Covariate shift is a situation in supervised learning where training and test inputs follow different distributions even though the functional relation remains unchanged. A common approach to compensating for the bias caused by covariate shift is to reweight the training samples according to importance, which is the ratio of test and training densities. We propose a novel method that allows us ...

Publication date: 2007